Conversation

@alpe (Contributor) commented Nov 27, 2025

Overview

Resolves #2609

The basic idea is to store an additional DAHeightHint field within the p2p store.

Since SignedHeader and Data are used in other places too, I added a DAHeightHintContainer type to wrap the tuple for the store only.
The DA height hint is added by the da_submitter and read in the syncer to fetch the missing DA header/data for the most recent block as required.

Please note: this is a breaking change to the p2p network and store.

@claude (bot, Contributor) commented Nov 27, 2025

Claude finished @alpe's task


Comprehensive Code Review

I've completed a thorough review of this PR implementing DA hints in the p2p layer. Here's my detailed analysis:


🏗️ Architecture & Design

✅ Strengths

1. Clean Abstraction with DAHeightHintContainer

  • The generic wrapper pattern (DAHeightHintContainer[H header.Header[H]]) elegantly adds DA hint metadata without modifying core SignedHeader and Data types
  • Proper implementation of the header.Header interface via delegation maintains compatibility with go-header library
  • Type aliases (SignedHeaderWithDAHint, DataWithDAHint) improve readability
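
For reference, a minimal sketch of the wrapper's shape, assembled from the definitions visible in this PR (the delegation bodies are illustrative):

// Entry holds the wrapped SignedHeader or Data; DAHeightHint records the DA
// height at which it was included on the DA layer.
type DAHeightHintContainer[H header.Header[H]] struct {
    DAHeightHint uint64
    Entry        H
}

// go-header interface methods delegate to the wrapped entry, for example:
func (s *DAHeightHintContainer[H]) Height() uint64  { return s.Entry.Height() }
func (s *DAHeightHintContainer[H]) ChainID() string { return s.Entry.ChainID() }

type SignedHeaderWithDAHint = DAHeightHintContainer[*types.SignedHeader]
type DataWithDAHint = DAHeightHintContainer[*types.Data]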

2. Solid Integration Points

  • DA submitter correctly updates hints after successful submission (lines 201, 250 in da_submitter.go)
  • Syncer efficiently uses hints to trigger targeted DA retrieval via async worker pool
  • P2P handler properly extracts and passes hints through the event pipeline

3. Performance Optimization

  • AsyncDARetriever with worker pool (5 workers) prevents blocking on DA lookups
  • Deduplication via inFlight map prevents redundant DA fetches
  • Non-blocking request queueing with buffered channels (size 100)
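
A rough sketch of the pattern these bullets describe, with assumed names where the PR's are not shown (the retrieve callback stands in for the actual DA fetch):

// Sketch of a deduplicating worker pool; the real AsyncDARetriever may differ.
type asyncRetriever struct {
    mu       sync.Mutex
    inFlight map[uint64]struct{} // one pending request per DA height
    workCh   chan uint64         // buffered (size 100 in the PR)
    retrieve func(ctx context.Context, daHeight uint64)
}

func (r *asyncRetriever) request(height uint64) {
    r.mu.Lock()
    defer r.mu.Unlock()
    if _, exists := r.inFlight[height]; exists {
        return // already queued or being fetched
    }
    select {
    case r.workCh <- height:
        r.inFlight[height] = struct{}{}
    default:
        // queue full: drop rather than block the caller
    }
}

func (r *asyncRetriever) start(ctx context.Context) {
    for i := 0; i < 5; i++ { // 5 workers, matching the PR
        go func() {
            for {
                select {
                case <-ctx.Done():
                    return
                case h := <-r.workCh:
                    r.retrieve(ctx, h)
                    r.mu.Lock()
                    delete(r.inFlight, h) // allow the height to be re-requested
                    r.mu.Unlock()
                }
            }
        }()
    }
}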

⚠️ Concerns

1. Breaking Change Without Migration Strategy ⚠️ CRITICAL

// pkg/sync/sync_service.go:107
ss, err := goheaderstore.NewStore[*DAHeightHintContainer[V]](store, ...)

The p2p layer now stores and exchanges DAHeightHintContainer instead of raw headers/data. This breaks:

  • Network protocol: Old nodes cannot communicate with upgraded nodes
  • Persistent storage: Existing p2p stores cannot deserialize new format
  • Zero downtime upgrades: Network-wide coordination required

Recommendations:

  1. Implement a migration path for existing p2p stores
  2. Consider dual-mode operation (broadcast both formats temporarily)
  3. Document upgrade procedure clearly for operators
  4. Add version negotiation or capability detection in p2p handshake
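
On point 1, one possible stopgap on the store side (a sketch only; note the two encodings are not self-describing, so this is heuristic) is a tolerant decoder that falls back to the legacy raw-entry format:

// Hypothetical compatibility decoder: try the new hint-prefixed format first,
// then fall back to decoding the bytes as a bare legacy entry.
func (s *DAHeightHintContainer[H]) unmarshalCompat(data []byte) error {
    if err := s.UnmarshalBinary(data); err == nil {
        return nil
    }
    s.DAHeightHint = 0 // legacy payloads carry no hint
    return s.Entry.UnmarshalBinary(data)
}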

2. Missing Validation in DAHeightHintContainer

// pkg/sync/da_hint_container.go:75-81
func (s *DAHeightHintContainer[H]) UnmarshalBinary(data []byte) error {
    if len(data) < 8 {
        return fmt.Errorf("invalid length: %d", len(data))
    }
    s.DAHeightHint = binary.BigEndian.Uint64(data)
    return s.Entry.UnmarshalBinary(data[8:])
}
  • No validation that DAHeightHint is reasonable (e.g., not zero when it should be set)
  • No bounds checking (could be far in the future)
  • Consider adding sanity checks or making zero explicitly valid
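
A sketch of what such checks could look like; currentDAHeight and maxDAHeightDrift are hypothetical, to be replaced by whatever bound the node can actually observe:

// Illustrative variant with sanity checks; zero remains explicitly valid
// and means "no hint available".
func (s *DAHeightHintContainer[H]) UnmarshalBinary(data []byte) error {
    if len(data) < 8 {
        return fmt.Errorf("invalid length: %d", len(data))
    }
    hint := binary.BigEndian.Uint64(data)
    if hint != 0 && hint > currentDAHeight()+maxDAHeightDrift {
        return fmt.Errorf("da height hint %d implausibly far in the future", hint)
    }
    s.DAHeightHint = hint
    return s.Entry.UnmarshalBinary(data[8:])
}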

3. Silent Error Handling

// block/internal/submitting/da_submitter.go:201-204
if err := s.headerDAHintAppender.AppendDAHint(ctx, res.Height, hashes...); err != nil {
    s.logger.Error().Err(err).Msg("failed to append da height hint in header p2p store")
    // ignoring error here, since we don't want to block the block submission
}

While pragmatic, silently ignoring store update failures could lead to:

  • Missing hints that force sequential DA scanning
  • Degraded performance that's hard to diagnose

Consider metrics or alerting for hint append failures.
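
A minimal sketch using a Prometheus counter (the metric name and label are made up for illustration):

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical counter surfacing otherwise-silent hint-append failures;
// register it via prometheus.MustRegister in the wiring code.
var daHintAppendFailures = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "evnode_da_hint_append_failures_total",
        Help: "Failed DA height hint appends to the p2p store.",
    },
    []string{"store"}, // "header" or "data"
)

// At the call site in da_submitter.go:
if err := s.headerDAHintAppender.AppendDAHint(ctx, res.Height, hashes...); err != nil {
    daHintAppendFailures.WithLabelValues("header").Inc()
    s.logger.Error().Err(err).Msg("failed to append da height hint in header p2p store")
}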

🔒 Security & Correctness

✅ Good Practices

  1. Input Validation: P2P handler validates proposer address before processing (line 89, p2p_handler.go)
  2. Context Propagation: Proper context handling throughout async operations
  3. No Credential Exposure: DA hints are metadata, no security-sensitive data

⚠️ Issues

1. Race Condition in P2PHandler

// block/internal/syncing/p2p_handler.go:63-71
func (h *P2PHandler) SetProcessedHeight(height uint64) {
    for range 1_000 {
        current := h.processedHeight.Load()
        if height <= current {
            return
        }
        if h.processedHeight.CompareAndSwap(current, height) {
            return
        }
    }
}

The 1000-iteration loop is unusual. While it handles contention, it:

  • Could theoretically fail silently after 1000 iterations
  • Suggests heavy contention that should be investigated
  • Consider: Does this indicate a design issue? Should concurrent updates be synchronized differently?
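
For comparison, the conventional pattern is an unbounded CAS loop, which is lock-free and never silently drops an update (a sketch of the standard idiom, not a claim about the PR's constraints):

// Monotonic-max update: retry until the stored value is >= height.
func (h *P2PHandler) SetProcessedHeight(height uint64) {
    for {
        current := h.processedHeight.Load()
        if height <= current {
            return
        }
        if h.processedHeight.CompareAndSwap(current, height) {
            return
        }
    }
}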

2. Potential Nil Pointer Dereference

// block/internal/syncing/syncer.go:459-497
if event.Source == common.SourceP2P {
    // ... uses event.DaHeightHints without nil check
}

DaHeightHints is a fixed-size array [2]uint64, so safe. However, event.Data and event.Header are pointers used throughout without defensive checks. Consider adding validation at entry points.


🧪 Testing

✅ Strengths

  1. Comprehensive Coverage:

    • Unit tests for DAHeightHintContainer marshaling/unmarshaling
    • Integration tests for hint storage and retrieval across restarts
    • Tests for async DA retriever deduplication and worker pool limits
  2. Good Test Patterns:

    • Table-driven tests for retry logic
    • Mock-based isolation for complex dependencies
    • Tests verify behavior across service restarts

⚠️ Gaps

1. Missing Migration Tests

  • No tests for upgrading from old store format to new format
  • No tests for handling legacy data during migration

2. Edge Cases Not Covered

// What happens if:
// - DA hint is zero (valid or invalid?)
// - Hints differ for header vs data of same block
// - Hint points to non-existent DA height
// - Async DA retriever channel is full

3. Test Coverage at 76.79%
From the codecov report, key files with missing coverage:

  • syncer.go: 12 lines missing (lines 459-497 DA hint logic)
  • sync_service.go: 11 lines missing
  • da_hint_container.go: 8 lines missing
  • async_da_retriever.go: 7 lines missing

Recommendation: Add tests for error paths and edge cases in hint handling logic.
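
A table-driven sketch of the missing edge cases (validPayloadWithHint is a hypothetical test helper, and the expectations are suggestions to be pinned down):

import (
    "math"
    "testing"

    "github.com/stretchr/testify/require"
)

func TestDAHeightHintContainerUnmarshalEdgeCases(t *testing.T) {
    specs := map[string]struct {
        data   []byte
        expErr bool
    }{
        "nil payload":        {data: nil, expErr: true},
        "shorter than hint":  {data: []byte{1, 2, 3}, expErr: true},
        "zero hint is valid": {data: validPayloadWithHint(t, 0)},
        "max hint":           {data: validPayloadWithHint(t, math.MaxUint64)},
    }
    for name, spec := range specs {
        t.Run(name, func(t *testing.T) {
            c := SignedHeaderWithDAHint{Entry: &types.SignedHeader{}}
            err := c.UnmarshalBinary(spec.data)
            if spec.expErr {
                require.Error(t, err)
                return
            }
            require.NoError(t, err)
        })
    }
}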


⚡ Performance & Concurrency

✅ Good Patterns

  1. Async DA Retrieval: Worker pool design prevents blocking main sync loop
  2. Buffered Channels: Appropriate buffer sizes (100 for work queue, 100 for height events)
  3. Deduplication: inFlight map prevents duplicate requests

⚠️ Observations

1. Channel Sizing

// block/internal/syncing/async_da_retriever.go:33
workCh: make(chan uint64, 100), // Buffer size 100

Buffer size of 100 seems reasonable, but:

  • What's the expected hint rate during sync?
  • Consider making this configurable or documenting the sizing rationale

2. Worker Pool Size

// block/internal/syncing/async_da_retriever.go:43
for i := 0; i < 5; i++ {  // 5 workers

Fixed at 5 workers. Consider:

  • Making this configurable based on DA layer characteristics
  • Monitoring worker utilization to tune this number

3. Marshaling Concurrency

// block/internal/submitting/da_submitter.go:549-574
sem := make(chan struct{}, 32)  // 32 concurrent marshalers

Good use of semaphore pattern for concurrent marshaling. The limit of 32 is reasonable for CPU-bound operations.


📝 Code Quality & Maintainability

✅ Strengths

  1. Clear Documentation: Functions have good comments explaining purpose
  2. Logging: Appropriate logging levels (debug for hints, error for failures)
  3. Error Wrapping: Consistent error context with fmt.Errorf

⚠️ Improvements

1. Magic Numbers

// Multiple places with unexplained constants
workCh: make(chan uint64, 100)      // Why 100?
heightInCh: make(chan common.DAHeightEvent, 100)  // Why 100?
for range 1_000 {  // Why 1000 iterations?

Consider defining as named constants with comments explaining rationale.
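
For example (values from the PR, names invented here):

const (
    // daRetrievalQueueSize bounds hint-triggered DA fetch requests; when the
    // queue is full, new requests are dropped instead of blocking the syncer.
    daRetrievalQueueSize = 100
    // heightEventBufferSize bounds buffered DA height events.
    heightEventBufferSize = 100
    // maxSetHeightCASRetries caps the CAS retry loop in SetProcessedHeight.
    maxSetHeightCASRetries = 1_000
)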

2. Code Duplication
The DA hint append pattern is duplicated for headers and data:

// da_submitter.go:201-204 and 250-253
if err := s.headerDAHintAppender.AppendDAHint(...); err != nil {
    s.logger.Error().Err(err).Msg("failed to append da height hint...")
}

Consider extracting to a helper function.
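
A possible shape for that helper (the interface and method names are suggestions; the receiver type is assumed from da_submitter.go):

// daHintAppender is an assumed narrow interface satisfied by both the header
// and data broadcasters, since both expose AppendDAHint.
type daHintAppender interface {
    AppendDAHint(ctx context.Context, daHeight uint64, hashes ...types.Hash) error
}

// appendDAHintBestEffort logs failures but never blocks submission on them.
func (s *DASubmitter) appendDAHintBestEffort(ctx context.Context, a daHintAppender, store string, daHeight uint64, hashes ...types.Hash) {
    if err := a.AppendDAHint(ctx, daHeight, hashes...); err != nil {
        s.logger.Error().Err(err).Str("store", store).Msg("failed to append da height hint")
    }
}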

3. Interface Segregation

// block/internal/common/expected_interfaces.go:18-22
type Broadcaster[H header.Header[H]] interface {
    WriteToStoreAndBroadcast(ctx context.Context, payload H, opts ...pubsub.PubOpt) error
    AppendDAHint(ctx context.Context, daHeight uint64, hashes ...types.Hash) error
    GetByHeight(ctx context.Context, height uint64) (H, uint64, error)
}

Good interface design. The addition of AppendDAHint and modified GetByHeight (now returns hint) are clean changes.


🎯 Specific File Comments

pkg/sync/da_hint_container.go

Line 60-62: IsZero Implementation

func (s *DAHeightHintContainer[H]) IsZero() bool {
    return s == nil
}

This only checks whether the container is nil, not whether the entry is zero. Should it also check s.Entry.IsZero()? The current implementation matches the usage pattern, but the naming might be misleading.
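
If the name should cover the entry as well, a variant like this would match it more closely (a suggestion, relying on go-header's IsZero handling nil receivers):

func (s *DAHeightHintContainer[H]) IsZero() bool {
    return s == nil || s.Entry.IsZero()
}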

Line 69-72: MarshalBinary Efficiency

out := make([]byte, 8+len(bz))
binary.BigEndian.PutUint64(out, s.DAHeightHint)
copy(out[8:], bz)

Good: Pre-allocates exact size. Consider: Would binary.Write be clearer? Current approach is more efficient.

block/internal/syncing/syncer.go

Lines 459-497: DA Hint Logic Complexity
The nested switch/case logic for handling hint combinations is complex:

switch {
case event.DaHeightHints == [2]uint64{0, 0}:
    // ...
case event.DaHeightHints[0] == 0:
    // ...
case event.DaHeightHints[1] == 0:
    // ...
default:
    // ...
}
  • Logic is correct but hard to follow
  • Consider extracting to a method like extractMissingDAHints(event) []uint64
  • Add comments explaining the four cases
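
A sketch of that extraction (slot semantics — index 0 for the header hint, index 1 for the data hint — are an assumption based on this review's case analysis):

// extractMissingDAHints returns the non-zero hints to fetch; zero means
// "no hint available", and equal hints collapse to a single fetch.
func extractMissingDAHints(event common.DAHeightEvent) []uint64 {
    hints := make([]uint64, 0, 2)
    for _, h := range event.DaHeightHints {
        if h == 0 {
            continue
        }
        hints = append(hints, h)
    }
    if len(hints) == 2 && hints[0] == hints[1] {
        hints = hints[:1] // header and data share a DA height
    }
    return hints
}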

Lines 486-495: Async Retrieval Trigger

if len(daHeightHints) > 0 {
    for _, daHeightHint := range daHeightHints {
        s.asyncDARetriever.RequestRetrieval(daHeightHint)
    }
}

Good: Non-blocking request. The worker pool handles backpressure.

block/internal/syncing/async_da_retriever.go

Lines 61-76: Request Deduplication

func (r *AsyncDARetriever) RequestRetrieval(height uint64) {
    r.mu.Lock()
    defer r.mu.Unlock()
    
    if _, exists := r.inFlight[height]; exists {
        return
    }
    // ...
}

Solid pattern. Lock scope is minimal. Consider: Could this be optimized with sync.Map for read-heavy workloads? Current approach is likely fine given the expected request rate.


🔧 Recommendations

Critical (Must Address)

  1. Migration Strategy: Develop and document upgrade path for existing networks

    • Store migration script
    • Dual-mode broadcast period
    • Version negotiation
  2. Test Coverage: Add tests for:

    • Store migration from old format
    • DA hint edge cases (zero, invalid, far future)
    • Error paths in hint append logic
  3. Documentation: Update docs with:

    • Breaking change notice
    • Upgrade procedure
    • Rollback plan

Important (Should Address)

  1. Error Handling: Consider metrics/alerting for silent hint append failures
  2. Validation: Add bounds checking for DA hints in unmarshal
  3. Code Clarity: Extract complex hint extraction logic to named method

Nice to Have

  1. Configuration: Make worker pool size and channel buffers configurable
  2. Constants: Define magic numbers as named constants
  3. Monitoring: Add metrics for async DA retriever utilization

📊 Overall Assessment

Code Quality: ⭐⭐⭐⭐ (4/5)

  • Well-structured, clean abstractions
  • Good concurrency patterns
  • Solid test coverage for new code

Architecture: ⭐⭐⭐⭐ (4/5)

  • Smart design that avoids modifying core types
  • Effective use of async patterns for performance
  • Clean separation of concerns

Production Readiness: ⭐⭐⭐ (3/5)

  • Blocking Issue: Missing migration strategy for breaking changes
  • Needs documentation for upgrade path
  • Should add monitoring for hint failures

Performance: ⭐⭐⭐⭐⭐ (5/5)

  • Excellent async design
  • Proper deduplication
  • Non-blocking operations

✅ Conclusion

This is a well-implemented feature with solid engineering. The DA hint mechanism effectively optimizes block synchronization by enabling targeted DA lookups instead of sequential scanning. The code quality is high with good abstractions, concurrency patterns, and test coverage.

However, the PR has one critical blocker: the lack of a migration strategy for the breaking p2p network and store changes. As noted by reviewers @tac0turtle, @julienrbrt, and @alpe, this needs to be addressed before merging.

I recommend:

  1. Implementing a migration path for existing stores
  2. Considering @julienrbrt's suggestion of broadcasting both formats during transition
  3. Adding comprehensive documentation for network upgrades
  4. Including migration tests in the test suite

Once the migration strategy is in place, this PR will be ready for production use.


@github-actions (bot, Contributor) commented Nov 27, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed | Format: ⏩ skipped | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Dec 8, 2025, 4:27 PM

@alpe changed the title from "Alex/hints" to "feat: DA hints in p2p" on Nov 28, 2025
@codecov (bot) commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 76.79558% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.81%. Comparing base (79310d0) to head (e3336ce).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
block/internal/syncing/syncer.go | 58.62% | 12 Missing ⚠️
pkg/sync/sync_service.go | 68.57% | 7 Missing and 4 partials ⚠️
pkg/sync/da_hint_container.go | 77.77% | 6 Missing and 2 partials ⚠️
block/internal/syncing/async_da_retriever.go | 86.53% | 6 Missing and 1 partial ⚠️
block/internal/submitting/da_submitter.go | 80.95% | 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2891      +/-   ##
==========================================
+ Coverage   65.68%   65.81%   +0.13%     
==========================================
  Files          87       89       +2     
  Lines        7932     8077     +145     
==========================================
+ Hits         5210     5316     +106     
- Misses       2154     2188      +34     
- Partials      568      573       +5     
Flag | Coverage Δ
combined | 65.81% <76.79%> (+0.13%) ⬆️


alpe added 3 commits November 28, 2025 17:20
* main:
  refactor: omit unnecessary reassignment (#2892)
  build(deps): Bump the all-go group across 5 directories with 6 updates (#2881)
  chore: fix inconsistent method name in retryWithBackoffOnPayloadStatus comment (#2889)
  fix: ensure consistent network ID usage in P2P subscriber (#2884)
cache.SetHeaderDAIncluded(headerHash.String(), res.Height, header.Height())
hashes[i] = headerHash
}
if err := s.headerDAHintAppender.AppendDAHint(ctx, res.Height, hashes...); err != nil {
@alpe (Contributor, Author) commented:

This is where the DA height is passed to the sync service to update the p2p store

Msg("P2P event with DA height hint, triggering targeted DA retrieval")

// Trigger targeted DA retrieval in background via worker pool
s.asyncDARetriever.RequestRetrieval(daHeightHint)
@alpe (Contributor, Author) commented:

This is where the "fetch from DA" is triggered for the current block event height

type SignedHeaderWithDAHint = DAHeightHintContainer[*types.SignedHeader]
type DataWithDAHint = DAHeightHintContainer[*types.Data]

type DAHeightHintContainer[H header.Header[H]] struct {
@alpe (Contributor, Author) commented Dec 1, 2025:

This is a data container to persist the DA hint together with the block header or data.
types.SignedHeader and types.Data are used all over the place, so I did not modify them but instead introduced this type for the p2p store and transfer only.

It may make sense to make this a proto type. WDYT?

return nil
}

func (s *SyncService[V]) AppendDAHint(ctx context.Context, daHeight uint64, hashes ...types.Hash) error {
@alpe (Contributor, Author) commented:

Stores the DA height hints

@alpe marked this pull request as ready for review on December 1, 2025, 09:32
@tac0turtle (Contributor) commented:

If the DA hint is not in the proto, how do other nodes get knowledge of the hint?

Also, how would an existing network handle using this feature? It's breaking, so is it safe to upgrade?

"github.com/evstack/ev-node/block/internal/cache"
"github.com/evstack/ev-node/block/internal/common"
"github.com/evstack/ev-node/block/internal/da"
coreda "github.com/evstack/ev-node/core/da"
Review comment (Member):

nit: gci linter

@julienrbrt (Member) left a comment:

Nice! It really makes sense.

I share the same concern as @tac0turtle, however, about the upgrade strategy, given this is p2p-breaking.

@julienrbrt previously approved these changes Dec 2, 2025
@alpe (Contributor, Author) commented Dec 2, 2025

> If the DA hint is not in the proto, how do other nodes get knowledge of the hint?

The sync_service wraps the header/data payload in a DAHeightHintContainer object that is passed upstream to the p2p layer. When the DA height is known, the store is updated.

> Also, how would an existing network handle using this feature? It's breaking, so is it safe to upgrade?

It is a breaking change. Instead of the signed header or data types, the p2p network exchanges DAHeightHintContainer, which is incompatible. The existing p2p stores would also need a migration to work.

@julienrbrt (Member) commented Dec 4, 2025

Could we broadcast both formats until all networks are updated? Then, in a final release, we can basically discard the previous one.

@alpe (Contributor, Author) commented Dec 5, 2025

FYI: this PR is missing a migration strategy for the p2p store (and ideally the network).

* main:
  refactor(sequencers): persist prepended batch (#2907)
  feat(evm): add force inclusion command (#2888)
  feat: DA client, remove interface part 1: copy subset of types needed for the client using blob rpc. (#2905)
  feat: forced inclusion (#2797)
  fix: fix and cleanup metrics (sequencers + block) (#2904)
  build(deps): Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /docs in the npm_and_yarn group across 1 directory (#2900)
  refactor(block): centralize timeout in client (#2903)
  build(deps): Bump the all-go group across 2 directories with 3 updates (#2898)
  chore: bump default timeout (#2902)
  fix: revert default db (#2897)
  refactor: remove obsolete // +build tag (#2899)
  fix:da visualiser namespace  (#2895)
Linked issue: sync: P2P should provide da inclusion hints (#2609)